Overview

Dataset statistics

Number of variables: 34
Number of observations: 19997
Missing cells: 8844
Missing cells (%): 1.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.2 MiB
Average record size in memory: 272.0 B
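
The missing-cell percentage can be reproduced from the counts above, assuming the total number of cells is variables times observations. A minimal sketch (the variable names are illustrative and not part of the report):

```python
# Figures copied from the dataset statistics above.
n_variables = 34
n_observations = 19997
missing_cells = 8844

# Assumption: every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(total_cells)            # 679898
print(f"{missing_pct:.1f}%")  # 1.3%
```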

Variable types

NUM: 13
CAT: 13
BOOL: 5
DATE: 2
UNSUPPORTED: 1

Reproduction

Analysis started: 2020-09-06 14:49:24.596157
Analysis finished: 2020-09-06 14:50:24.713062
Duration: 1 minute and 0.12 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
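
The reported duration follows from the two timestamps. A small sketch using only the standard library (the format string is an assumption about how the timestamps are written, not something the report states):

```python
from datetime import datetime

# Assumed timestamp layout, matching the two values shown above.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2020-09-06 14:49:24.596157", FMT)
finished = datetime.strptime("2020-09-06 14:50:24.713062", FMT)

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute and {seconds:.2f} seconds")
```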

Warnings

first_name has a high cardinality: 2839 distinct values (High cardinality)
last_name has a high cardinality: 3267 distinct values (High cardinality)
job_title has a high cardinality: 195 distinct values (High cardinality)
address has a high cardinality: 3487 distinct values (High cardinality)
Age Scale is highly correlated with Age (High correlation)
Age is highly correlated with Age Scale (High correlation)
online_order has 360 (1.8%) missing values (Missing)
last_name has 642 (3.2%) missing values (Missing)
DOB has 446 (2.2%) missing values (Missing)
job_title has 2394 (12.0%) missing values (Missing)
job_industry_category has 3229 (16.1%) missing values (Missing)
tenure has 446 (2.2%) missing values (Missing)
transaction_id has unique values (Unique)
DOB is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
product_id has 1375 (6.9%) zeros (Zeros)

Variables

transaction_id
Real number (ℝ≥0)

UNIQUE

Distinct count: 19997
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9999.856078411762
Minimum: 1
Maximum: 20000
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1000.8
Q1: 5000
median: 10000
Q3: 14999
95-th percentile: 19000.2
Maximum: 20000
Range: 19999
Interquartile range (IQR): 9999

Descriptive statistics

Standard deviation: 5773.636854
Coefficient of variation (CV): 0.5773719951
Kurtosis: -1.199948114
Mean: 9999.856078
Median Absolute Deviation (MAD): 5000
Skewness: 0.0001487462264
Sum: 199967122
Variance: 33334882.52

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2047 | 1 | < 0.1%
10912 | 1 | < 0.1%
12947 | 1 | < 0.1%
2708 | 1 | < 0.1%
661 | 1 | < 0.1%
6806 | 1 | < 0.1%
4759 | 1 | < 0.1%
19100 | 1 | < 0.1%
17053 | 1 | < 0.1%
8865 | 1 | < 0.1%
Other values (19987) | 19987 | 99.9%

Minimum 5 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
20000 | 1 | < 0.1%
19999 | 1 | < 0.1%
19998 | 1 | < 0.1%
19997 | 1 | < 0.1%
19996 | 1 | < 0.1%
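
Several figures in the transaction_id summary are derived from the others. The sketch below recomputes them from the reported values; it assumes the usual definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared), which agree with the numbers shown.

```python
# Values copied from the transaction_id summary above.
minimum, maximum = 1, 20000
q1, q3 = 5000, 14999
mean = 9999.856078
std = 5773.636854

value_range = maximum - minimum  # reported Range
iqr = q3 - q1                    # reported Interquartile range (IQR)
cv = std / mean                  # reported Coefficient of variation (CV)
variance = std ** 2              # reported Variance

print(value_range, iqr)  # 19999 9999
print(f"{cv:.6f}")       # 0.577372
```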
 

product_id
Real number (ℝ≥0)

ZEROS

Distinct count: 101
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45.37145571835775
Minimum: 0
Maximum: 100
Zeros: 1375
Zeros (%): 6.9%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 18
median: 44
Q3: 72
95-th percentile: 94
Maximum: 100
Range: 100
Interquartile range (IQR): 54

Descriptive statistics

Standard deviation: 30.75087583
Coefficient of variation (CV): 0.6777581929
Kurtosis: -1.247667283
Mean: 45.37145572
Median Absolute Deviation (MAD): 27
Skewness: 0.08167230031
Sum: 907293
Variance: 945.6163646

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0 | 1375 | 6.9%
3 | 354 | 1.8%
1 | 311 | 1.6%
35 | 268 | 1.3%
38 | 267 | 1.3%
4 | 241 | 1.2%
2 | 240 | 1.2%
90 | 225 | 1.1%
12 | 224 | 1.1%
80 | 223 | 1.1%
Other values (91) | 16269 | 81.4%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1375 | 6.9%
1 | 311 | 1.6%
2 | 240 | 1.2%
3 | 354 | 1.8%
4 | 241 | 1.2%

Maximum 5 values

Value | Count | Frequency (%)
100 | 130 | 0.7%
99 | 152 | 0.8%
98 | 156 | 0.8%
97 | 142 | 0.7%
96 | 161 | 0.8%
 

customer_id
Real number (ℝ≥0)

Distinct count: 3493
Unique (%): 17.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1737.7516127419112
Minimum: 1
Maximum: 3500
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 172
Q1: 857
median: 1736
Q3: 2613
95-th percentile: 3320
Maximum: 3500
Range: 3499
Interquartile range (IQR): 1756

Descriptive statistics

Standard deviation: 1011.221384
Coefficient of variation (CV): 0.5819136499
Kurtosis: -1.203543648
Mean: 1737.751613
Median Absolute Deviation (MAD): 878
Skewness: 0.008740971946
Sum: 34749819
Variance: 1022568.687

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2183 | 14 | 0.1%
1068 | 14 | 0.1%
2476 | 14 | 0.1%
2072 | 13 | 0.1%
637 | 13 | 0.1%
1672 | 13 | 0.1%
1946 | 13 | 0.1%
3232 | 13 | 0.1%
1140 | 13 | 0.1%
2912 | 13 | 0.1%
Other values (3483) | 19864 | 99.3%

Minimum 5 values

Value | Count | Frequency (%)
1 | 11 | 0.1%
2 | 3 | < 0.1%
3 | 8 | < 0.1%
4 | 2 | < 0.1%
5 | 6 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
3500 | 6 | < 0.1%
3499 | 7 | < 0.1%
3498 | 6 | < 0.1%
3497 | 3 | < 0.1%
3496 | 4 | < 0.1%
 

(variable name missing)
Date

Distinct count: 364
Unique (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2017-01-01 00:00:00
Maximum: 2017-12-30 00:00:00

Histogram

online_order
Boolean

MISSING

Distinct count: 2
Unique (%): < 0.1%
Missing: 360
Missing (%): 1.8%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
1 | 9829 | 49.2%
0 | 9808 | 49.0%
(Missing) | 360 | 1.8%
 

order_status
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
Approved | 19818 | 99.1%
Cancelled | 179 | 0.9%

Length

Max length: 9
Median length: 8
Mean length: 8.008951343
Min length: 8
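
The length statistics for a categorical variable describe the string length of each value, weighted by how often it occurs. A minimal sketch for order_status, assuming the two category counts above cover every row (the dictionary is an illustrative stand-in for the real column):

```python
# Category counts copied from the order_status table above.
counts = {"Approved": 19818, "Cancelled": 179}

total = sum(counts.values())
# Weight each label's length by its number of occurrences.
mean_length = sum(len(label) * n for label, n in counts.items()) / total
max_length = max(len(label) for label in counts)
min_length = min(len(label) for label in counts)

print(min_length, max_length)  # 8 9
print(f"{mean_length:.6f}")    # 8.008951
```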

brand
Categorical

Distinct count: 6
Unique (%): < 0.1%
Missing: 197
Missing (%): 1.0%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
Solex | 4252 | 21.3%
Giant Bicycles | 3312 | 16.6%
WeareA2B | 3295 | 16.5%
OHM Cycles | 3042 | 15.2%
Trek Bicycles | 2990 | 15.0%
Norco Bicycles | 2909 | 14.5%
(Missing) | 197 | 1.0%

Length

Max length: 14
Median length: 10
Mean length: 10.23128469
Min length: 3

product_line
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 197
Missing (%): 1.0%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
Standard | 14175 | 70.9%
Road | 3968 | 19.8%
Touring | 1234 | 6.2%
Mountain | 423 | 2.1%
(Missing) | 197 | 1.0%

Length

Max length: 8
Median length: 8
Mean length: 7.095314297
Min length: 3

product_class
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 197
Missing (%): 1.0%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
medium | 13823 | 69.1%
high | 3013 | 15.1%
low | 2964 | 14.8%
(Missing) | 197 | 1.0%

Length

Max length: 6
Median length: 6
Mean length: 5.224433665
Min length: 3

product_size
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 197
Missing (%): 1.0%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
medium | 12987 | 64.9%
large | 3976 | 19.9%
small | 2837 | 14.2%
(Missing) | 197 | 1.0%

Length

Max length: 6
Median length: 6
Mean length: 5.629744462
Min length: 3

list_price
Real number (ℝ≥0)

Distinct count: 296
Unique (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1107.919640946142
Minimum: 12.01
Maximum: 2091.47
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 12.01
5-th percentile: 100.35
Q1: 575.27
median: 1163.89
Q3: 1635.3
95-th percentile: 1992.93
Maximum: 2091.47
Range: 2079.46
Interquartile range (IQR): 1060.03

Descriptive statistics

Standard deviation: 582.8187868
Coefficient of variation (CV): 0.5260478877
Kurtosis: -1.083023045
Mean: 1107.919641
Median Absolute Deviation (MAD): 521.58
Skewness: -0.1260900774
Sum: 22155069.06
Variance: 339677.7383

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2091.47 | 465 | 2.3%
1403.5 | 396 | 2.0%
71.49 | 274 | 1.4%
1231.15 | 235 | 1.2%
1890.39 | 233 | 1.2%
1129.13 | 232 | 1.2%
1073.07 | 229 | 1.1%
1894.19 | 228 | 1.1%
945.04 | 226 | 1.1%
574.64 | 223 | 1.1%
Other values (286) | 17256 | 86.3%

Minimum 5 values

Value | Count | Frequency (%)
12.01 | 195 | 1.0%
16.08 | 1 | < 0.1%
26.15 | 1 | < 0.1%
32.44 | 1 | < 0.1%
36.78 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
2091.47 | 465 | 2.3%
2086.07 | 1 | < 0.1%
2083.94 | 208 | 1.0%
2076.81 | 1 | < 0.1%
2064.08 | 1 | < 0.1%
 

standard_cost
Real number (ℝ≥0)

Distinct count: 100
Unique (%): 0.5%
Missing: 197
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 556.0680474747475
Minimum: 7.21
Maximum: 1759.85
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 7.21
5-th percentile: 53.62
Q1: 215.14
median: 507.58
Q3: 795.1
95-th percentile: 1479.11
Maximum: 1759.85
Range: 1752.64
Interquartile range (IQR): 579.96

Descriptive statistics

Standard deviation: 405.9768807
Coefficient of variation (CV): 0.7300848926
Kurtosis: 0.2867003197
Mean: 556.0680475
Median Absolute Deviation (MAD): 287.52
Skewness: 0.864008653
Sum: 11010147.34
Variance: 164817.2277

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
388.92 | 465 | 2.3%
954.82 | 396 | 2.0%
53.62 | 274 | 1.4%
161.6 | 235 | 1.2%
260.14 | 233 | 1.2%
677.48 | 232 | 1.2%
933.84 | 229 | 1.1%
598.76 | 228 | 1.1%
507.58 | 226 | 1.1%
459.71 | 223 | 1.1%
Other values (90) | 17059 | 85.3%

Minimum 5 values

Value | Count | Frequency (%)
7.21 | 195 | 1.0%
13.44 | 187 | 0.9%
44.71 | 198 | 1.0%
45.26 | 188 | 0.9%
53.62 | 274 | 1.4%

Maximum 5 values

Value | Count | Frequency (%)
1759.85 | 195 | 1.0%
1610.9 | 200 | 1.0%
1580.47 | 190 | 1.0%
1531.42 | 169 | 0.8%
1516.13 | 185 | 0.9%
 

Margin
Real number (ℝ≥0)

Distinct count: 296
Unique (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 557.329685452818
Minimum: 4.8
Maximum: 2086.07
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 4.8
5-th percentile: 25.09
Q1: 135.85
median: 445.21
Q3: 830.24
95-th percentile: 1612.25
Maximum: 2086.07
Range: 2081.27
Interquartile range (IQR): 694.39

Descriptive statistics

Standard deviation: 497.2945398
Coefficient of variation (CV): 0.8922807322
Kurtosis: -0.4062154187
Mean: 557.3296855
Median Absolute Deviation (MAD): 330.28
Skewness: 0.8451109374
Sum: 11144921.72
Variance: 247301.8593

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1702.55 | 465 | 2.3%
448.68 | 396 | 2.0%
17.87 | 274 | 1.4%
1069.55 | 235 | 1.2%
1630.25 | 233 | 1.2%
451.65 | 232 | 1.2%
139.23 | 229 | 1.1%
1295.43 | 228 | 1.1%
437.46 | 226 | 1.1%
114.93 | 223 | 1.1%
Other values (286) | 17256 | 86.3%

Minimum 5 values

Value | Count | Frequency (%)
4.8 | 195 | 1.0%
14.23 | 163 | 0.8%
15.08 | 188 | 0.9%
16.08 | 1 | < 0.1%
17.87 | 274 | 1.4%

Maximum 5 values

Value | Count | Frequency (%)
2086.07 | 1 | < 0.1%
2076.81 | 1 | < 0.1%
2064.08 | 1 | < 0.1%
2062.95 | 1 | < 0.1%
2061.38 | 1 | < 0.1%
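
The common values of Margin appear to equal list_price minus standard_cost. The report does not state this; the sketch below checks it on rows paired by their identical counts in the three "Common values" tables, which is an assumption about which values belong together.

```python
# (list_price, standard_cost, Margin) triples from the common-value tables,
# paired by matching row counts (465, 396, 274, 235, 233). This pairing is
# an assumption, not something the report states.
rows = [
    (2091.47, 388.92, 1702.55),
    (1403.50, 954.82, 448.68),
    (71.49, 53.62, 17.87),
    (1231.15, 161.60, 1069.55),
    (1890.39, 260.14, 1630.25),
]

def margin_matches(rows):
    """True if Margin equals list_price - standard_cost to the cent."""
    return all(
        round(list_price - standard_cost, 2) == margin
        for list_price, standard_cost, margin in rows
    )

print(margin_matches(rows))
```

If the relationship holds for the whole column, Margin carries no information beyond the two price columns, which matters when choosing model features.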
 

(variable name missing)
Date

Distinct count: 100
Unique (%): 0.5%
Missing: 197
Missing (%): 1.0%
Memory size: 156.2 KiB
Minimum: 1991-01-21 00:00:00
Maximum: 2016-12-06 00:00:00

Histogram

first_name
Categorical

HIGH CARDINALITY

Distinct count: 2839
Unique (%): 14.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
Corabelle | 36 | 0.2%
Tobe | 31 | 0.2%
Emlyn | 29 | 0.1%
Lindsay | 28 | 0.1%
Gar | 26 | 0.1%
Max | 26 | 0.1%
Catie | 26 | 0.1%
Keeley | 25 | 0.1%
Hubie | 24 | 0.1%
Ebba | 24 | 0.1%
Other values (2829) | 19722 | 98.6%

Length

Max length: 14
Median length: 6
Mean length: 5.960194029
Min length: 2

last_name
Categorical

HIGH CARDINALITY
MISSING

Distinct count: 3267
Unique (%): 16.9%
Missing: 642
Missing (%): 3.2%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
Gladman | 24 | 0.1%
Fyndon | 23 | 0.1%
Leek | 18 | 0.1%
Elgey | 18 | 0.1%
Ramsdell | 18 | 0.1%
Creebo | 18 | 0.1%
Mulliner | 17 | 0.1%
Alpes | 17 | 0.1%
Lithgow | 17 | 0.1%
Pristnor | 17 | 0.1%
Other values (3257) | 19168 | 95.9%
(Missing) | 642 | 3.2%

Length

Max length: 19
Median length: 7
Mean length: 6.889983498
Min length: 2

(variable name missing)
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
0 | 10549 | 52.8%
1 | 9448 | 47.2%
 

(variable name missing)
Real number (ℝ≥0)

Distinct count: 100
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48.772465869880484
Minimum: 0
Maximum: 99
Zeros: 188
Zeros (%): 0.9%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5
Q1: 24
median: 48
Q3: 73
95-th percentile: 95
Maximum: 99
Range: 99
Interquartile range (IQR): 49

Descriptive statistics

Standard deviation: 28.59825009
Coefficient of variation (CV): 0.5863605536
Kurtosis: -1.17658005
Mean: 48.77246587
Median Absolute Deviation (MAD): 25
Skewness: 0.05797703879
Sum: 975303
Variance: 817.859908

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
16 | 291 | 1.5%
80 | 273 | 1.4%
48 | 257 | 1.3%
20 | 256 | 1.3%
2 | 256 | 1.3%
67 | 255 | 1.3%
13 | 254 | 1.3%
83 | 250 | 1.3%
19 | 250 | 1.3%
53 | 250 | 1.3%
Other values (90) | 17405 | 87.0%

Minimum 5 values

Value | Count | Frequency (%)
0 | 188 | 0.9%
1 | 176 | 0.9%
2 | 256 | 1.3%
3 | 133 | 0.7%
4 | 178 | 0.9%

Maximum 5 values

Value | Count | Frequency (%)
99 | 214 | 1.1%
98 | 246 | 1.2%
97 | 216 | 1.1%
96 | 229 | 1.1%
95 | 141 | 0.7%
 

DOB
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 446
Missing (%): 2.2%
Memory size: 156.4 KiB

Age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 57
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 44.31814772215832
Minimum: 18
Maximum: 150
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 18
5-th percentile: 22
Q1: 33
median: 43
Q3: 53
95-th percentile: 65
Maximum: 150
Range: 132
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 17.06849624
Coefficient of variation (CV): 0.3851355961
Kurtosis: 6.969850922
Mean: 44.31814772
Median Absolute Deviation (MAD): 10
Skewness: 1.891374652
Sum: 886230
Variance: 291.3335639

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
42 | 1349 | 6.7%
43 | 905 | 4.5%
46 | 705 | 3.5%
44 | 664 | 3.3%
41 | 631 | 3.2%
45 | 597 | 3.0%
40 | 594 | 3.0%
39 | 526 | 2.6%
47 | 498 | 2.5%
34 | 494 | 2.5%
Other values (47) | 13034 | 65.2%

Minimum 5 values

Value | Count | Frequency (%)
18 | 104 | 0.5%
19 | 145 | 0.7%
20 | 242 | 1.2%
21 | 359 | 1.8%
22 | 333 | 1.7%

Maximum 5 values

Value | Count | Frequency (%)
150 | 9 | < 0.1%
120 | 446 | 2.2%
88 | 10 | 0.1%
85 | 5 | < 0.1%
79 | 3 | < 0.1%
 

Age Scale
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 57
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.29545431814772216
Minimum: 0.12
Maximum: 1.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0.12
5-th percentile: 0.1466666667
Q1: 0.22
median: 0.2866666667
Q3: 0.3533333333
95-th percentile: 0.4333333333
Maximum: 1
Range: 0.88
Interquartile range (IQR): 0.1333333333

Descriptive statistics

Standard deviation: 0.1137899749
Coefficient of variation (CV): 0.3851355961
Kurtosis: 6.969850922
Mean: 0.2954543181
Median Absolute Deviation (MAD): 0.06666666667
Skewness: 1.891374652
Sum: 5908.2
Variance: 0.0129481584

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.28 | 1349 | 6.7%
0.2866666667 | 905 | 4.5%
0.3066666667 | 705 | 3.5%
0.2933333333 | 664 | 3.3%
0.2733333333 | 631 | 3.2%
0.3 | 597 | 3.0%
0.2666666667 | 594 | 3.0%
0.26 | 526 | 2.6%
0.3133333333 | 498 | 2.5%
0.2266666667 | 494 | 2.5%
Other values (47) | 13034 | 65.2%

Minimum 5 values

Value | Count | Frequency (%)
0.12 | 104 | 0.5%
0.1266666667 | 145 | 0.7%
0.1333333333 | 242 | 1.2%
0.14 | 359 | 1.8%
0.1466666667 | 333 | 1.7%

Maximum 5 values

Value | Count | Frequency (%)
1 | 9 | < 0.1%
0.8 | 446 | 2.2%
0.5866666667 | 10 | 0.1%
0.5666666667 | 5 | < 0.1%
0.5266666667 | 3 | < 0.1%
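
The Age Scale values are consistent with Age divided by its maximum of 150, which would explain the high-correlation warning and the identical CV, kurtosis and skewness of the two variables. The report does not describe how Age Scale was built, so the sketch below is an inference checked against a few (Age, Age Scale) pairs with matching counts; `scale_age` and `pearson` are illustrative helpers.

```python
# (Age, Age Scale) pairs read from the value tables above, matched by count.
pairs = [(18, 0.12), (42, 0.28), (45, 0.3), (120, 0.8), (150, 1.0)]
MAX_AGE = 150  # reported maximum of Age

def scale_age(age, max_age=MAX_AGE):
    """Divide by the maximum so that the largest age maps to 1.0."""
    return age / max_age

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    norm_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (norm_x * norm_y)

ages = [age for age, _ in pairs]
scaled = [scale_age(age) for age in ages]

# Every recomputed value should agree with the reported Age Scale.
all_match = all(abs(s - reported) < 1e-9 for s, (_, reported) in zip(scaled, pairs))
r = pearson(ages, scaled)
print(all_match, round(r, 6))
```

A pure rescaling gives a correlation of 1, so only one of the two columns is worth keeping for most analyses.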
 

job_title
Categorical

HIGH CARDINALITY
MISSING

Distinct count: 195
Unique (%): 1.1%
Missing: 2394
Missing (%): 12.0%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
Social Worker | 226 | 1.1%
Legal Assistant | 221 | 1.1%
Business Systems Development Analyst | 221 | 1.1%
Assistant Professor | 212 | 1.1%
Executive Secretary | 208 | 1.0%
Internal Auditor | 207 | 1.0%
Nuclear Power Engineer | 205 | 1.0%
Tax Accountant | 197 | 1.0%
Administrative Officer | 196 | 1.0%
Chemical Engineer | 195 | 1.0%
Other values (185) | 15515 | 77.6%
(Missing) | 2394 | 12.0%

Length

Max length: 36
Median length: 17
Mean length: 16.40761114
Min length: 3

job_industry_category
Categorical

MISSING

Distinct count: 9
Unique (%): 0.1%
Missing: 3229
Missing (%): 16.1%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
Manufacturing | 4014 | 20.1%
Financial Services | 3886 | 19.4%
Health | 3099 | 15.5%
Retail | 1758 | 8.8%
Property | 1297 | 6.5%
IT | 1084 | 5.4%
Entertainment | 698 | 3.5%
Argiculture | 578 | 2.9%
Telecommunications | 354 | 1.8%
(Missing) | 3229 | 16.1%

Length

Max length: 18
Median length: 8
Mean length: 9.766815022
Min length: 2

(variable name missing)
Real number (ℝ)

Distinct count: 10
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.41421213181977296
Minimum: -5
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -5
5-th percentile: -5
Q1: -3
median: 1
Q3: 2
95-th percentile: 4
Maximum: 5
Range: 10
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 2.976688795
Coefficient of variation (CV): -7.186387279
Kurtosis: -1.443477946
Mean: -0.4142121318
Median Absolute Deviation (MAD): 3
Skewness: -0.003231128853
Sum: -8283
Variance: 8.860676181

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1 | 4014 | 20.1%
-3 | 3886 | 19.4%
2 | 3229 | 16.1%
-4 | 3099 | 15.5%
4 | 1758 | 8.8%
3 | 1297 | 6.5%
-5 | 1084 | 5.4%
-2 | 698 | 3.5%
-1 | 578 | 2.9%
5 | 354 | 1.8%

Minimum 5 values

Value | Count | Frequency (%)
-5 | 1084 | 5.4%
-4 | 3099 | 15.5%
-3 | 3886 | 19.4%
-2 | 698 | 3.5%
-1 | 578 | 2.9%

Maximum 5 values

Value | Count | Frequency (%)
5 | 354 | 1.8%
4 | 1758 | 8.8%
3 | 1297 | 6.5%
2 | 3229 | 16.1%
1 | 4014 | 20.1%
 

(variable name missing)
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
-1 | 10141 | 50.7%
1 | 5049 | 25.2%
0 | 4807 | 24.0%

Length

Max length: 2
Median length: 2
Mean length: 1.507126069
Min length: 1

(variable name missing)
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
N | 19989 | > 99.9%
Y | 8 | < 0.1%
 

(variable name missing)
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
0 | 10012 | 50.1%
1 | 9985 | 49.9%
 

tenure
Real number (ℝ≥0)

MISSING

Distinct count: 22
Unique (%): 0.1%
Missing: 446
Missing (%): 2.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.683238709017441
Minimum: 1.0
Maximum: 22.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 6
median: 11
Q3: 15
95-th percentile: 20
Maximum: 22
Range: 21
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 5.676402887
Coefficient of variation (CV): 0.5313372697
Kurtosis: -1.069951904
Mean: 10.68323871
Median Absolute Deviation (MAD): 5
Skewness: 0.04361215577
Sum: 208868
Variance: 32.22154974

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
7 | 1190 | 6.0%
5 | 1096 | 5.5%
11 | 1096 | 5.5%
16 | 1067 | 5.3%
12 | 1060 | 5.3%
8 | 1032 | 5.2%
14 | 1019 | 5.1%
9 | 995 | 5.0%
17 | 985 | 4.9%
10 | 985 | 4.9%
Other values (12) | 9026 | 45.1%

Minimum 5 values

Value | Count | Frequency (%)
1 | 876 | 4.4%
2 | 736 | 3.7%
3 | 819 | 4.1%
4 | 929 | 4.6%
5 | 1096 | 5.5%

Maximum 5 values

Value | Count | Frequency (%)
22 | 255 | 1.3%
21 | 275 | 1.4%
20 | 498 | 2.5%
19 | 837 | 4.2%
18 | 959 | 4.8%
 

address
Categorical

HIGH CARDINALITY

Distinct count: 3487
Unique (%): 17.5%
Missing: 29
Missing (%): 0.1%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
3 Talisman Place | 14 | 0.1%
8142 Tomscot Drive | 14 | 0.1%
567 Scott Park | 14 | 0.1%
3 Mariners Cove Terrace | 14 | 0.1%
4297 Emmet Lane | 14 | 0.1%
8587 Graceland Way | 13 | 0.1%
7916 Clyde Gallagher Place | 13 | 0.1%
3126 Butterfield Pass | 13 | 0.1%
259 Barnett Crossing | 13 | 0.1%
1 Nevada Park | 13 | 0.1%
Other values (3477) | 19833 | 99.2%
(Missing) | 29 | 0.1%

Length

Max length: 29
Median length: 18
Mean length: 17.68090214
Min length: 3

postcode
Real number (ℝ≥0)

Distinct count: 835
Unique (%): 4.2%
Missing: 29
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 2987.623347355769
Minimum: 2000.0
Maximum: 4883.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 2000
5-th percentile: 2047
Q1: 2200
median: 2767
Q3: 3754
95-th percentile: 4551
Maximum: 4883
Range: 2883
Interquartile range (IQR): 1554

Descriptive statistics

Standard deviation: 851.3066466
Coefficient of variation (CV): 0.284944435
Kurtosis: -0.918265722
Mean: 2987.623347
Median Absolute Deviation (MAD): 597
Skewness: 0.6261828591
Sum: 59656863
Variance: 724723.0066

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2153 | 169 | 0.8%
2770 | 146 | 0.7%
2170 | 140 | 0.7%
2155 | 136 | 0.7%
3977 | 128 | 0.6%
2763 | 125 | 0.6%
2145 | 125 | 0.6%
2065 | 117 | 0.6%
2760 | 112 | 0.6%
2261 | 109 | 0.5%
Other values (825) | 18661 | 93.3%

Minimum 5 values

Value | Count | Frequency (%)
2000 | 41 | 0.2%
2007 | 13 | 0.1%
2008 | 7 | < 0.1%
2009 | 27 | 0.1%
2010 | 57 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
4883 | 9 | < 0.1%
4879 | 11 | 0.1%
4878 | 12 | 0.1%
4877 | 7 | < 0.1%
4873 | 9 | < 0.1%
 

state
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 29
Missing (%): 0.1%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
NSW | 10200 | 51.0%
VIC | 4541 | 22.7%
QLD | 4262 | 21.3%
New South Wales | 485 | 2.4%
Victoria | 480 | 2.4%
(Missing) | 29 | 0.1%

Length

Max length: 15
Median length: 3
Mean length: 3.411061659
Min length: 3
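
The state column mixes abbreviations with full names ("NSW" and "New South Wales", "VIC" and "Victoria"), so the five distinct values describe only three states. A hedged sketch of one way to merge them; the `ALIASES` mapping and the `normalise` helper are hypothetical and not part of the report:

```python
# Counts copied from the state table above.
counts = {
    "NSW": 10200,
    "VIC": 4541,
    "QLD": 4262,
    "New South Wales": 485,
    "Victoria": 480,
}

# Hypothetical mapping from full names to the abbreviations already in use.
ALIASES = {"New South Wales": "NSW", "Victoria": "VIC"}

def normalise(counts, aliases=ALIASES):
    """Merge counts for labels that refer to the same state."""
    merged = {}
    for label, n in counts.items():
        key = aliases.get(label, label)
        merged[key] = merged.get(key, 0) + n
    return merged

print(normalise(counts))  # {'NSW': 10685, 'VIC': 5021, 'QLD': 4262}
```

The merged counts still sum to 19968, the number of non-missing rows.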

country
Categorical

Distinct count: 1
Unique (%): < 0.1%
Missing: 29
Missing (%): 0.1%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
Australia | 19968 | 99.9%
(Missing) | 29 | 0.1%

Length

Max length: 9
Median length: 9
Mean length: 8.991298695
Min length: 3

property_valuation
Real number (ℝ≥0)

Distinct count: 12
Unique (%): 0.1%
Missing: 29
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.516376201923077
Minimum: 1.0
Maximum: 12.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 6
median: 8
Q3: 10
95-th percentile: 11
Maximum: 12
Range: 11
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.824782869
Coefficient of variation (CV): 0.3758171216
Kurtosis: -0.3215331451
Mean: 7.516376202
Median Absolute Deviation (MAD): 2
Skewness: -0.6434549432
Sum: 150087
Variance: 7.979398256

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
8 | 3342 | 16.7%
9 | 3260 | 16.3%
10 | 2850 | 14.3%
7 | 2371 | 11.9%
11 | 1396 | 7.0%
6 | 1181 | 5.9%
5 | 1130 | 5.7%
4 | 1070 | 5.4%
12 | 971 | 4.9%
3 | 903 | 4.5%
Other values (2) | 1494 | 7.5%

Minimum 5 values

Value | Count | Frequency (%)
1 | 807 | 4.0%
2 | 687 | 3.4%
3 | 903 | 4.5%
4 | 1070 | 5.4%
5 | 1130 | 5.7%

Maximum 5 values

Value | Count | Frequency (%)
12 | 971 | 4.9%
11 | 1396 | 7.0%
10 | 2850 | 14.3%
9 | 3260 | 16.3%
8 | 3342 | 16.7%
 

Label
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count | Frequency (%)
1 | 10535 | 52.7%
0 | 9462 | 47.3%
 

Interactions

[Interaction plots: rendered as images in the original report]
2020-09-06T10:50:14.614103image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:14.788701image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:14.987780image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:15.178642image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:15.381684image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:15.585560image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:15.772990image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:15.994763image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:16.205677image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:16.436063image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:16.665626image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:17.099384image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:17.320753image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:17.541164image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:17.751145image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:17.955815image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:18.149655image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:18.362085image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:18.565197image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:18.784409image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:18.993778image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:19.187483image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:19.393967image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:19.969492image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:20.170044image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:20.358361image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:20.574782image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:20.783878image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/
2020-09-06T10:50:20.995131image/svg+xmlMatplotlib v3.3.0, https://matplotlib.org/

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
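
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code the report itself runs; the function name `pearson_r` is hypothetical. Because the 1/n factors in the covariance and in the standard deviations cancel, the sketch works with raw sums of deviations.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of (co-)deviations; the common 1/n factor cancels in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Shifting or rescaling y (by a positive factor) leaves r unchanged.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [12, 14, 16, 18]))
```

Both calls describe a perfectly linear increasing relationship, so they should agree up to floating-point rounding, which illustrates the invariance under location and scale noted above.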

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
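
As a rough sketch of that recipe, the plain-Python example below first converts each variable to ranks and then applies the Pearson formula to the ranks. The helper names `average_ranks` and `spearman_rho` are hypothetical, and giving tied values the average of their rank positions is an assumption made here (it is a common convention, but the text above does not specify tie handling).

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# y = x**3 is nonlinear but strictly increasing, so the ranks agree perfectly.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
```

Because only the ordering matters, any strictly increasing transformation of either variable leaves ρ unchanged.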

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
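
The pair-counting definition above corresponds to the variant usually called tau-a, and it can be sketched directly in plain Python. The function name `kendall_tau_a` is hypothetical; pairs tied in either variable are counted as neither concordant nor discordant here, which is an assumption, and library implementations often apply a tie correction (tau-b) that can give different values on data with ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x2 - x1) * (y2 - y1)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0: tied pair, counted as neither
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations give three pairs: two concordant, one discordant.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The brute-force loop visits every pair, so it is quadratic in the number of observations; it is meant to show the definition, not to be efficient.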

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
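
A bias-corrected Cramér's V along the lines of Bergsma's proposal can be sketched as follows. This is an illustration under stated assumptions rather than the report's own implementation: the function name `cramers_v_corrected` is hypothetical, the input is assumed to be a contingency table of observed counts (rows for one variable, columns for the other), and degenerate tables with fewer than two rows or columns are simply mapped to 0.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return 0.0  # assumption: treat degenerate tables as "no association"

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)


# Independent variables versus perfectly associated variables.
print(cramers_v_corrected([[10, 10], [10, 10]]))
print(cramers_v_corrected([[50, 0], [0, 50]]))
```

The first table has identical rows, so the statistic is 0; the second places every observation on the diagonal, so the corrected value comes out at 1 up to floating-point rounding.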

Missing values


Sample

First rows

transaction_id, product_id, customer_id, transaction_date, online_order, order_status, brand, product_line, product_class, product_size, list_price, standard_cost, Margin, product_first_sold_date, first_name, last_name, gender_encoded, past_3_years_bike_related_purchases, DOB, Age, Age Scale, job_title, job_industry_category, job_industry_category_encoded, wealth_segment_encoded, deceased_indicator, owns_car_encoded, tenure, address, postcode, state, country, property_valuation, Label
0263883342017-04-070.0ApprovedSolexTouringmediumlarge2083.94675.031408.912013-09-16JephthahBachmann0591843-12-211501.0Legal AssistantIT-50N020.0833 Luster Way4005.0QLDAustralia8.01
1904412342017-02-130.0ApprovedWeareA2BStandardmediummedium1231.15161.601069.552004-08-17JephthahBachmann0591843-12-211501.0Legal AssistantIT-50N020.0833 Luster Way4005.0QLDAustralia8.01
2169350342017-02-140.0ApprovedNaNNaNNaNNaN1034.17NaN1034.17NaTJephthahBachmann0591843-12-211501.0Legal AssistantIT-50N020.0833 Luster Way4005.0QLDAustralia8.01
31929165342017-09-190.0ApprovedWeareA2BStandardmediummedium1807.45778.691028.762015-05-21JephthahBachmann0591843-12-211501.0Legal AssistantIT-50N020.0833 Luster Way4005.0QLDAustralia8.01
41208313342017-07-230.0ApprovedSolexStandardmediummedium1163.89589.27574.622016-07-09JephthahBachmann0591843-12-211501.0Legal AssistantIT-50N020.0833 Luster Way4005.0QLDAustralia8.01
5979260342017-06-251.0ApprovedGiant BicyclesStandardhighsmall1977.361759.85217.512011-08-24JephthahBachmann0591843-12-211501.0Legal AssistantIT-50N020.0833 Luster Way4005.0QLDAustralia8.01
6110715342017-08-220.0ApprovedNorco BicyclesStandardlowmedium958.74748.90209.842005-12-07JephthahBachmann0591843-12-211501.0Legal AssistantIT-50N020.0833 Luster Way4005.0QLDAustralia8.01
710398342017-07-011.0ApprovedSolexRoadmediumsmall1703.521516.13187.392011-04-16JephthahBachmann0591843-12-211501.0Legal AssistantIT-50N020.0833 Luster Way4005.0QLDAustralia8.01
81780896342017-04-101.0ApprovedWeareA2BRoadlowsmall1172.781043.77129.012002-10-10JephthahBachmann0591843-12-211501.0Legal AssistantIT-50N020.0833 Luster Way4005.0QLDAustralia8.01
91576031682017-03-180.0ApprovedTrek BicyclesStandardmediumlarge2091.47388.921702.552010-11-05ReggieBroggetti08NaN1200.8General ManagerIT-50N1NaN16 Golf View Center3020.0VICAustralia6.01

Last rows

transaction_id, product_id, customer_id, transaction_date, online_order, order_status, brand, product_line, product_class, product_size, list_price, standard_cost, Margin, product_first_sold_date, first_name, last_name, gender_encoded, past_3_years_bike_related_purchases, DOB, Age, Age Scale, job_title, job_industry_category, job_industry_category_encoded, wealth_segment_encoded, deceased_indicator, owns_car_encoded, tenure, address, postcode, state, country, property_valuation, Label
1998766713112502017-03-210.0ApprovedGiant BicyclesStandardmediummedium230.91173.1857.732006-11-10JacklynKewley0422001-11-02 00:00:00180.12Help Desk TechnicianManufacturing1-1N01.0795 Arapahoe Hill4818.0QLDAustralia7.00
199883025972452017-12-181.0ApprovedSolexStandardmediumlarge202.62151.9650.662016-03-29NoellGrahlmans062001-09-26 00:00:00180.12Associate ProfessorFinancial Services-30N01.007227 Hoard Terrace3500.0VICAustralia1.00
1998913374977512017-08-221.0ApprovedSolexStandardmediumlarge202.62151.9650.662016-03-29AmieDufty0412001-10-31 00:00:00180.12Business Systems Development AnalystFinancial Services-30N01.05 Dahle Trail2117.0NSWAustralia10.00
199909764974222017-02-100.0ApprovedSolexStandardmediumlarge202.62151.9650.662016-03-29VitoNorker1782002-01-06 00:00:00180.12NaNManufacturing10N01.0509 Fisk Hill2031.0NSWAustralia11.00
199914712567512017-04-270.0ApprovedOHM CyclesStandardmediummedium183.86137.9045.961997-10-04AmieDufty0412001-10-31 00:00:00180.12Business Systems Development AnalystFinancial Services-30N01.05 Dahle Trail2117.0NSWAustralia10.00
1999253945614022017-02-141.0ApprovedOHM CyclesStandardmediummedium183.86137.9045.961997-10-04HillierAndraud1582001-12-08 00:00:00180.12Assistant ProfessorTelecommunications5-1N01.042829 Charing Cross Road3107.0VICAustralia8.01
1999314236015192017-08-181.0ApprovedSolexStandardmediummedium71.4953.6217.872004-09-28MarwinJeyness1352001-11-30 00:00:00180.12Administrative Assistant IVTelecommunications51N11.07 Bartillon Circle2260.0NSWAustralia8.00
199941127424422017-06-111.0ApprovedSolexStandardmediummedium71.4953.6217.872012-12-02LincVedyasov122001-10-06 00:00:00180.12NaNFinancial Services-3-1N01.03 Sutteridge Park4074.0QLDAustralia6.00
1999573896112502017-11-251.0ApprovedOHM CyclesStandardlowmedium71.1656.9314.232015-06-17JacklynKewley0422001-11-02 00:00:00180.12Help Desk TechnicianManufacturing1-1N01.0795 Arapahoe Hill4818.0QLDAustralia7.00
1999610481927592017-08-161.0ApprovedOHM CyclesRoadhighlarge12.017.214.801999-06-23MelodeeHendrik0162001-11-14 00:00:00180.12OperatorHealth-40N11.068111 Bartillon Court3995.0VICAustralia3.01